Owing to the limited computing power and storage resources of hardware platforms, implementing energy-efficient and computationally efficient convolutional neural networks (CNNs) on embedded systems remains a primary challenge for hardware designers. In this context, a complete design of a heterogeneous embedded system implemented on a system-on-chip (SoC) with a field-programmable gate array (FPGA) is proposed. The design adopts a cascaded input multiplexing structure that enables two independent multiply-accumulate operations in a single DSP, reducing external memory accesses, improving system efficiency, and lowering power consumption. Compared with other designs, power efficiency is improved by over 38.7%. The framework is successfully deployed for a large-scale CNN on low-cost devices, significantly improving the power efficiency of the network model; on the ZYNQ XC7Z045 device, the power efficiency reaches 102 Gops/W. Furthermore, when inferring the convolutional (CONV) layers of VGG-16 with this framework, a frame rate of up to 10.9 fps is achieved, demonstrating effective acceleration of CNN inference in power-constrained environments.
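The abstract does not detail how the cascaded input multiplexing structure maps two multiply-accumulates onto one DSP; the Python sketch below only illustrates the general operand-packing idea behind such dual-MAC-per-DSP schemes (two weights sharing one activation are packed into a single wide multiplication, and the two products are recovered from the wide result). The names SHIFT, pack_weights, and dual_mac, the 18-bit packing offset, and the shared-activation assumption are illustrative assumptions, not the authors' exact scheme.

```python
# Illustrative sketch (assumed scheme): two signed 8-bit weights packed into one
# wide multiplier operand so a single wide multiply yields two independent products.
SHIFT = 18  # hypothetical packing offset; must exceed the bit width of each partial product

def pack_weights(w_hi: int, w_lo: int) -> int:
    """Pack two signed 8-bit weights into one wide operand: w_hi * 2**SHIFT + w_lo."""
    return (w_hi << SHIFT) + w_lo

def dual_mac(a: int, w_hi: int, w_lo: int) -> tuple[int, int]:
    """Emulate one wide multiplication that produces a*w_hi and a*w_lo at once."""
    wide = a * pack_weights(w_hi, w_lo)   # single wide multiply (one DSP in hardware)
    p_lo = wide & ((1 << SHIFT) - 1)      # low slice holds a*w_lo in two's complement
    if p_lo >= (1 << (SHIFT - 1)):        # sign-extend the low slice
        p_lo -= (1 << SHIFT)
    p_hi = (wide - p_lo) >> SHIFT         # remaining high part holds a*w_hi
    return p_hi, p_lo

if __name__ == "__main__":
    # Both products recovered from one multiplication: (-15, 35) == (5*-3, 5*7)
    print(dual_mac(5, -3, 7))
```

In this sketch the 18-bit offset leaves enough headroom for an 8-bit-by-8-bit partial product, so the two results never overlap; a hardware implementation would additionally accumulate the separated products in the DSP's post-adder.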